DETAILED ACTION

Claim Rejections - 35 USC § 102 

1. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the 
basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

2. Claims 1-4 are rejected under 35 U.S.C. 102(e) as being anticipated by Page et al. (6,175,821). 
The table below summarizes limitations of this application and parts of Page et al. that "read
on" these limitations. 



Claim # / Limitations / Page et al.

Claim 1

Limitations:
A method for converting text to concatenated voice by utilizing a digital voice library and a
set of playback rules, the digital voice library including a plurality of speech items including
words and syllables and a corresponding plurality of voice recordings wherein each speech item
corresponds to at least one available voice recording, the method comprising:

training the digital voice library to associate each syllable speech item with a literal text
syllable of the particular syllable speech item.

Page et al.:
The system contains ROM (3, FIG. 1) that stores recordings of phrases used for message
outputs. In addition, speech converter (4, FIG. 1) has a diphone dictionary for converting text
to speech.

Inherently, for speech synthesis, this dictionary has to be trained (or initially populated) in
order to create a mapping between text syllables and diphones.

Claim 2

Limitations:
The method of claim 1 further comprising:

receiving a sequence of words including known words that correspond to word speech items in
the digital voice library and including unknown words

converting each known word into a word speech item in accordance with the digital voice
library

and for each unknown word, parsing the unknown word to determine a sequence of literal text
syllables and converting the text syllable sequence to a sequence of syllable speech items in
accordance with the digital voice library.

Page et al.:
The system receives a text message (Col. 4, lines 60-63), then synthesizes the message using
the diphone dictionary of the speech synthesizer (Col. 4, lines 63-66). In addition, invariable
(known) portions of the text message are converted directly to preset recordings by the message
generator (Col. 5, lines 42-45).

Claim 3

Limitations:
The method of claim 2 further comprising:

converting the sequence of word speech items and syllable speech items into a sequence of
voice recordings in accordance with the set of playback rules.

Page et al.:
The variable and invariable portions are pre-processed in order to produce a natural-sounding
message (Col. 5, lines 36-45).

Claim 4

Limitations:
The method of claim 3 further comprising:

generating voice data based on the sequence of voice recordings by concatenating adjacent
recordings in the sequence of voice recordings.

Page et al.:
The variable and invariable portions of the message are concatenated together into a unified
recording by the message generator (Col. 5, lines 45-49).
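
For orientation only, the steps recited in claims 1, 2, and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's method or the system of Page et al.: the library contents, the recording identifiers, and the function names are all hypothetical; the greedy longest-match syllable split is an assumed stand-in for the parsing step; and the playback rules of claim 3 are not modeled.

```python
# Hypothetical voice library: speech items (words and syllables) mapped to
# placeholder recording identifiers. Real recordings would be audio data.
WORD_RECORDINGS = {"hello": "rec_hello", "world": "rec_world"}
SYLLABLE_RECORDINGS = {"to": "rec_to", "ma": "rec_ma", "ri": "rec_ri"}


def split_into_syllables(word):
    """Greedy longest-match split of a word into library syllables.

    Assumed for illustration only. Returns None when the word cannot be
    covered; there is no backtracking, so some coverable words may fail.
    """
    syllables = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in SYLLABLE_RECORDINGS:
                syllables.append(word[i:j])
                i = j
                break
        else:
            return None
    return syllables


def text_to_recordings(text):
    """Convert text to an ordered sequence of recording identifiers."""
    sequence = []
    for word in text.lower().split():
        if word in WORD_RECORDINGS:
            # Known word: use its word speech item directly.
            sequence.append(WORD_RECORDINGS[word])
        else:
            # Unknown word: parse into text syllables, then map each one.
            syllables = split_into_syllables(word)
            if syllables is None:
                raise ValueError(f"cannot parse {word!r}")
            sequence.extend(SYLLABLE_RECORDINGS[s] for s in syllables)
    return sequence


def concatenate(recordings):
    """Stand-in for audio concatenation: join identifiers in order."""
    return "+".join(recordings)
```

Here "training" in the sense of claim 1 corresponds to nothing more than populating the syllable dictionary, which is the reading the examiner applies to the diphone dictionary of Page et al.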



Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness 
rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 
4. Claim 5 is rejected under 35 U.S.C. 103(a) as being obvious over Page et al. in view of
Karalli et al. (5,668,926). 

As per claim 5, Page et al. disclose a speech converter that has a diphone dictionary for
converting text to speech (4, FIG. 1). 

Page et al. do not disclose training the dictionary by "utilizing a neural network having an 
input and an output to train the digital voice library with the neural network receiving the literal 
text syllable of the particular syllable speech item as input and with the neural network 
outputting the associated syllable speech item." 

Karalli et al. teach the use of neural networks to train a text-to-speech system (Col. 2,
lines 21-33). 

It would have been obvious to one of ordinary skill in the art at the time the invention 
was made to modify Page et al. as taught in Karalli et al., in order to populate the diphone
dictionary in an efficient manner and also to provide an effective method of resolving ambiguous
inputs to the dictionary.

5. Claim 6 is rejected under 35 U.S.C. 103(a) as being obvious over Page et al. 

Page et al. do not disclose training the digital library by "manually associating each 
syllable speech item with the literal text syllable of the particular syllable speech item." 

The examiner takes official notice that the method of manually populating look-up
dictionaries is well known to practitioners in the computer arts.

It would have been obvious to one of ordinary skill in the art at the time the invention 
was made to modify Page et al. by manually associating each literal text syllable with the 
corresponding syllable speech item since this would be the most straightforward and "brute
force" method of training the dictionary. 

6. Claims 7-10 are rejected under 35 U.S.C. 103(a) as being obvious over Page et al. in view of
Lin et al. (6,076,060).

As per claim 7, Page et al. disclose a speech converter that has a diphone dictionary for
converting text to speech (4, FIG. 1). 

Page et al. do not disclose "parsing the unknown word to determine a sequence of literal 
text syllables and known words, and converting the sequence to a sequence of syllable speech 
items and word speech items in accordance with the digital voice library."

Lin et al. teach parsing the unknown word into a sequence of syllables and word speech
items (Col. 6, lines 56-60) that are later converted to speech sounds (16, FIG. 2).

It would have been obvious to one of ordinary skill in the art at the time the invention 
was made to modify Page et al. as taught in Lin et al., in order to eventually create a diphone
representation of each unknown word so that it could be synthesized by a speech synthesizer that
requires an input of diphones to produce the output sound.

As per claim 8, Page et al. do not disclose parsing that comprises: 

• parsing the unknown word in the forward direction to determine any known words 

• parsing the unknown word in the reverse direction to determine any known words and,
where any known words overlap, selecting the larger word

• parsing the unknown word in the forward direction to determine any literal text syllables 

• parsing the unknown word in the reverse direction to determine any literal text syllables. 

Lin et al. teach parsing the words from left to right and from right to left in order to
determine sub-words and literal text symbols (Col. 3, lines 45-53). Also, the larger words are
chosen first (Col. 3, lines 55-58).

It would have been obvious to one of ordinary skill in the art at the time the invention 
was made to modify Page et al. as taught in Lin et al., in order to create an efficient parsing 
technique that more closely matches the way words are parsed when spoken by humans. This 
method of parsing is less likely to miss important substrings in unknown words.
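
As a rough illustration of the bidirectional parsing recited in claim 8, the sketch below scans an unknown word in the forward and reverse directions and keeps the larger candidate where matches overlap. It is an assumption-laden example, not the algorithm of Lin et al. or of the application: the vocabulary, the function names, and the tie-breaking order (longer span first, then leftmost) are all hypothetical.

```python
# Hypothetical vocabulary of known words for the example.
KNOWN_WORDS = {"sun", "suns", "screen"}


def forward_matches(text, vocab):
    """Scan left to right; at each start, take the longest vocabulary match."""
    spans = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                spans.append((i, j))
                i = j
                break
        else:
            i += 1  # nothing starts here; move on one character
    return spans


def reverse_matches(text, vocab):
    """Scan right to left; at each end, take the longest vocabulary match."""
    spans = []
    j = len(text)
    while j > 0:
        for i in range(0, j):
            if text[i:j] in vocab:
                spans.append((i, j))
                j = i
                break
        else:
            j -= 1  # nothing ends here; move back one character
    return spans


def select_larger(spans):
    """Where candidate spans overlap, keep the larger one."""
    chosen = []
    # Longest spans first; ties are broken by the leftmost start.
    for span in sorted(set(spans), key=lambda s: (s[0] - s[1], s[0])):
        if all(span[1] <= c[0] or span[0] >= c[1] for c in chosen):
            chosen.append(span)
    return sorted(chosen)


def find_known_words(text, vocab=KNOWN_WORDS):
    """Combine both scan directions and resolve overlaps."""
    candidates = forward_matches(text, vocab) + reverse_matches(text, vocab)
    return [text[i:j] for i, j in select_larger(candidates)]
```

For "sunscreen", the forward pass alone finds only "suns" and strands the remainder, while the reverse pass finds "screen" and "sun"; combining both and preferring the larger span yields "sun" and "screen". The same two passes could be repeated with a syllable vocabulary over any text left uncovered, which corresponds to the last two steps of the claim.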

As per claims 9 and 10, Page et al. disclose the calculation and adjustment of pitch of the
generated message using transition signals and appropriate voice recordings (Col. 2, lines 32-48).

7. Claim 11 is rejected under 35 U.S.C. 103(a) as being obvious over Page et al. in view of
Carter et al. (6,600,814).

Page et al. do not disclose "for each unknown word, after the unknown word is parsed,
storing results of the parsing in the digital voice library so that a next encounter with the same 
unknown word may be handled more efficiently." 

Carter et al. teach storing processed portions of text in the text-to-speech system to
alleviate the load on the system (Col. 2, lines 30-39).
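
The storing of parsing results recited in claim 11 amounts to memoizing the parser. A minimal sketch follows, assuming a dictionary-backed cache; the class name, the counter, and the toy parser are hypothetical and do not come from Carter et al. or from the application.

```python
class ParseCache:
    """Store parsing results so a repeated unknown word is not parsed again."""

    def __init__(self, parse_fn):
        self._parse_fn = parse_fn
        self._results = {}
        self.parse_calls = 0  # counts actual parses, for demonstration

    def lookup(self, word):
        # Parse only on the first encounter; later encounters reuse the
        # stored result.
        if word not in self._results:
            self.parse_calls += 1
            self._results[word] = self._parse_fn(word)
        return self._results[word]


def toy_parse(word):
    """Hypothetical stand-in parser: split a word into two-letter chunks."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]
```

Calling `ParseCache(toy_parse).lookup("tomato")` twice on the same instance returns the same syllable list both times, while `parse_calls` stays at 1.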

It would have been obvious to one of ordinary skill in the art at the time the invention 
was made to modify Page et al. as taught by Carter et al. to store the parsed results of unknown 
words so that subsequent encounters with the same words are handled more efficiently. This concept of
"caching" data for future reference is extremely well-known and widely used in the art of
computing.

Conclusion

9. The prior art made of record and not relied upon is considered pertinent to applicant's
disclosure.

Coorman et al. (6,665,641) teach a concatenating synthesizer.

Sharman (5,949,961) teaches a word syllabification method for text-to-speech systems.

Conkie (6,173,263) teaches a concatenating text-to-speech system that uses prosody analysis.

10. Any inquiry concerning this communication or earlier communications from the examiner
should be directed to Dmitry Brant whose telephone number is (703) 305-8954. The examiner
can normally be reached on Mon. - Fri. (8:30am - 5pm).

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Talivaldis Ivars Smits, can be reached on (703) 306-3011. The fax phone number for
the organization where this application or proceeding is assigned is (703) 872-9306.

Any inquiry of a general nature or relating to the status of this application or proceeding
should be directed to the Tech Center 2600 receptionist whose telephone number is (703) 305-4700.